Predicting pathway membership via domain signatures

نویسندگان

  • Holger Fröhlich
  • Mark Fellmann
  • Holger Sültmann
  • Annemarie Poustka
  • Tim Beißbarth
چکیده

MOTIVATION Functional characterization of genes is of great importance for the understanding of complex cellular processes. Valuable information for this purpose can be obtained from pathway databases, like KEGG. However, only a small fraction of genes is annotated with pathway information up to now. In contrast, information on contained protein domains can be obtained for a significantly higher number of genes, e.g. from the InterPro database. RESULTS We present a classification model, which for a specific gene of interest can predict the mapping to a KEGG pathway, based on its domain signature. The classifier makes explicit use of the hierarchical organization of pathways in the KEGG database. Furthermore, we take into account that a specific gene can be mapped to different pathways at the same time. The classification method produces a scoring of all possible mapping positions of the gene in the KEGG hierarchy. Evaluations of our model, which is a combination of a SVM and ranking perceptron approach, show a high prediction performance. Moreover, for signaling pathways we reveal that it is even possible to forecast accurately the membership to individual pathway components. AVAILABILITY The R package gene2pathway is a supplement to this article.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Gene2Pathway Predicting Pathway Membership via Domain Signatures

Functional characterization of genes is of great importance, e.g. in microarray studies. Valueable information for this purpose can be obtained from pathway databases, like KEGG. However, only a small fraction of genes is annotated with pathway information up to now. In contrast, information on contained protein domains can be obtained for a significantly higher number of genes, e.g. from the I...

متن کامل

Exploring Gene Signatures in Different Molecular Subtypes of Gastric Cancer (MSS/ TP53+, MSS/TP53-): A Network-based and Machine Learning Approach

Gastric cancer (GC) is one of the leading causes of cancer mortality, worldwide. Molecular understanding of GC’s different subtypes is still dismal and it is necessary to develop new subtype-specific diagnostic and therapeutic approaches. Therefore developing comprehensive research in this area is demanding to have a deeper insight into molecular processes, underlying these subtypes. In this st...

متن کامل

Diagnosis of Coronary Artery Disease via a Novel Fuzzy Expert System Optimized by Cuckoo Search

In this paper, we propose a novel fuzzy expert system for detection of Coronary Artery Disease, using cuckoo search algorithm. This system includes three phases: firstly, at the stage of fuzzy system design, a decision tree is used to extract if-then rules which provide the crisp rules required for Coronary Artery Disease detection. Secondly, the fuzzy system is formed by setting the intervals ...

متن کامل

Evaluation of Optimal Fuzzy Membership Function for Wind Speed Forecasting

In this paper, a new approach is proposed in order to select an optimal membership function for inputs of wind speed prediction system. Then using a fuzzy method and the stochastic characteristics of wind speed in the previous year, the wind speed modeling is performed and the wind speed for the future year will be predicted. In this proposed method, the average and the standard deviation of in...

متن کامل

Association Between a Prognostic Gene Signature and Functional Gene Sets

BACKGROUND The development of expression-based gene signatures for predicting prognosis or class membership is a popular and challenging task. Besides their stringent validation, signatures need a functional interpretation and must be placed in a biological context. Popular tools such as Gene Set Enrichment have drawbacks because they are restricted to annotated genes and are unable to capture ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 24  شماره 

صفحات  -

تاریخ انتشار 2008